Internet Message Format | 1994-01-24 | 3KB
Return-Path: <timbl>
Received: by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
id AA21572; Tue, 14 Apr 92 12:00:05 GMT+0200
Date: Tue, 14 Apr 92 12:00:05 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9204141000.AA21572@nxoc01.cern.ch>
Received: by NeXT Mailer (1.62)
To: wei@sting.berkeley.edu (Pei Y. Wei)
Subject: HTML printing: Conversion HTML->LaTeX->dvi->Postscript
Cc: www-talk@nxoc01.cern.ch
> Does there exist something like a HTML to postscript/troff/* converter?
> I'm looking for something better than ``www -n foo.html > lpr''.
Here's a simple html to latex converter using "sed". It's not complete, but it
produces reasonable results on the W3 documentation, so I can now (at last) make a
W3 book. (A minor problem is that sed ignores any characters at the end of a file
which are not followed by a final newline, and the NeXT editor sometimes generates
HTML without the final newline.)
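
One way to work around the missing final newline is to repair each HTML file before it reaches sed. The following is a minimal sketch, assuming a POSIX shell with `tail -c`; the function name `ensure_final_newline` is hypothetical and not part of the original message.

```shell
# Hypothetical helper: append a newline to a file only if its last byte is not one.
ensure_final_newline() {
    # "tail -c 1" prints the last byte of the file. Command substitution strips
    # a trailing newline, so a non-empty result means the file does not end
    # with a newline. Empty files are left untouched.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Calling `ensure_final_newline foo.html` before running sed leaves a well-formed file unchanged and adds a single newline to one that lacks it.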
You have to prepend the document style you want to the output of sed. My makefile
looks like
echo " \\\\batchmode \\\\documentstyle{book}" > the_www_project.tex
sed -f html2latex.sed $(THE_HTML) >> the_www_project.tex
latex the_www_project.tex
For a large book, I concatenate several html files, passing some of them through
another sed file which removes the <TITLE> elements and demotes the <H1> to <H2>
etc. The file below italicises anchors, but in general it might be best to remove
them altogether. The smartest thing would be to generate the TeX to make a little
superscript reference to the page number to which a link refers. Any TeX experts
out there?
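
The second sed pass described above is not included in the message. A minimal sketch of what such a filter might look like follows, assuming upper-case tags and a <TITLE> element that fits on one line; the function name `demote_html` and the exact rules are illustrative guesses, not the author's file.

```shell
# Hypothetical filter: reads HTML on stdin, writes demoted HTML on stdout.
demote_html() {
    # Drop a single-line <TITLE>...</TITLE> element, then push each heading
    # down one level. The deepest level is handled first so that a freshly
    # demoted heading is not demoted a second time; <H5> merges into <H6>.
    # \(/*\) captures the optional slash of a closing tag.
    sed -e 's?<TITLE>[^<]*</TITLE>??g' \
        -e 's?<\(/*\)H5>?<\1H6>?g' \
        -e 's?<\(/*\)H4>?<\1H5>?g' \
        -e 's?<\(/*\)H3>?<\1H4>?g' \
        -e 's?<\(/*\)H2>?<\1H3>?g' \
        -e 's?<\(/*\)H1>?<\1H2>?g'
}
```

A chapter file could then be passed through `demote_html < chapter.html` before being handed to html2latex.sed, so that its top-level heading becomes a section rather than a chapter.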
I'll put the "W3 Book" in postscript up for anonymous FTP shortly.
Tim BL
__________________________________________ html2latex follows
1i\
\\begin{document}
$a\
\\end{document}
/<XMP>/,/<.XMP>/b lit
/<.XMP>/b lit
/<xmp>/,/<.xmp>/b lit
/<.xmp>/b lit
s?&gt.?>?g
s?&lt.?<?g
s?&amp.?\&?g
s?\\?\\backslash ?g
s?{?\\{?g
s?}?\\}?g
s?%?\\%?g
s?\$?\\$?g
s?&?\\&?g
s?#?\\#?g
s?_?\\_?g
s?~?\\~{}?g
s?\^?\\^{}?g
s?<TITLE>?\\author{Generated from the Hypertext}\\title{?g
s?</TITLE>?}\\maketitle ?g
s?<ADDRESS>??g
s?</ADDRESS>??g
s?<P>?\\par?g
s?<p>?\\par?g
s?<Hn>?\\part{?g
s?</Hn>?}?g
s?<H1>?\\chapter{?g
s?</H[0-9]>?}?g
s?<H2>?\\section{?g
s?<H3>?\\subsection{?g
s?<H4>?\\subsubsection{?g
s?<H5>?\\paragraph{?g
s?<H6>?\\subparagraph{?g
s?<UL>?\\begin{itemize}?g
s?</UL>?\\end{itemize}?g
s?<LI>?\\item ?g
s?<ul>?\\begin{itemize}?g
s?</ul>?\\end{itemize}?g
s?<li>?\\item ?g
s?<DL>?\\begin{description}?g
s?</DL>?\\end{description}?g
s?<DT>?\\item[?g
s?<DD>?]?g
s?<dl>?\\begin{description}?g
s?</dl>?\\end{description}?g
s?<dt>?\\item[?g
s?<dd>?]?g
s?<NEXTID[^>]*>??g
s?<A[^>]*>?\\it ?g
s?</A>?\\/\\rm ?g
: lit
s?<XMP>?\\begin{verbatim}?g
s?</XMP>?\\end{verbatim}?
s?<xmp>?\\begin{verbatim}?g
s?</xmp>?\\end{verbatim}?
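
The escaping rules near the top of the script depend on their order: the backslash is rewritten first, so the backslashes that the later rules introduce are not themselves turned into \backslash. A small sketch of that ordering, using three of the rules from the script wrapped in a hypothetical shell function `escape_latex`:

```shell
# Hypothetical wrapper around three of the script's escaping rules.
escape_latex() {
    # 1. A literal backslash becomes "\backslash " (must run first).
    # 2. "%" becomes "\%".
    # 3. "&" becomes "\&" (in a sed replacement, an unescaped & stands for
    #    the matched text, so "\\&" yields a backslash followed by "&").
    sed -e 's?\\?\\backslash ?g' \
        -e 's?%?\\%?g' \
        -e 's?&?\\&?g'
}
```

For example, `printf '%s\n' '50% & a\b' | escape_latex` yields `50\% \& a\backslash b`. Reversing the order would instead expand the backslashes added in front of `%` and `&`.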